Introductory Statement

The central mission of the Stanford Center for Research and Development
in Teaching is to contribute to the improvement of teaching in American
schools. Given the urgency of the times, technological developments, and
advances in knowledge from the behavioral sciences about teaching and
learning, the Center works on the assumption that a fundamental
reformulation of the future role of the teacher will take place. The
Center's mission is to specify as clearly, and on as empirical a basis as
possible, the direction of that reformulation, to help shape it, to fashion
and validate programs for training and retraining teachers in accordance
with it, and to develop and test materials and procedures for use in these
new training programs.

The Center is at work in three interrelated problem areas:
(a) Heuristic Teaching, which aims at promoting self-motivated and
sustained inquiry in students, emphasizes affective as well as cognitive
processes, and places a high premium upon the uniqueness of each pupil,
teacher, and learning situation; (b) The Environment for Teaching, which
aims at making schools more flexible so that pupils, teachers, and learning
materials can be brought together in ways that take account of their
many differences; and (c) Teaching Students from Low-Income Areas, which
aims to determine whether more heuristically oriented teachers and more
open kinds of schools can and should be developed to improve the education
of those currently labeled as the poor and the disadvantaged.

The Methodology Unit developed the Research and Development Memorandum
which follows, to deal with the problem of comparing proportions
where some cases are missing. Such nonresponse problems are frequently
encountered in the analysis of data gathered by Center projects.
Two-sample problems with dichotomous data are considered; some
specific probability models are developed to describe which observations
are missing and why; and the statistical techniques appropriate under
each of the models are discussed.
MISSING DATA PROBLEMS FOR TWO SAMPLES
ON A DICHOTOMOUS VARIABLE

Janet Dixon Elashoff and Robert M. Elashoff^1

1. Introduction

Incomplete or missing data is a major problem in many fields. Data 
may be incomplete because of nonresponse, random loss, transcription 
errors, refusal to cooperate, and a variety of other reasons. In these 
instances, statistical techniques to deal with the incomplete data are 
necessary. One possibility is simply to delete and ignore the incomplete 
cases. To select the appropriate technique, however, some facts must 
be known about the kind of observations which are missing and which 
variables influence the loss of certain observations. 

In this study two-sample problems with dichotomous data are considered;
some specific probability models are developed to describe
which observations are missing and why; and the statistical techniques 
appropriate under each of the models are discussed. Using techniques 
which assume that observations are missing at random may be extremely 
misleading. If the probability model governing the occurrence of missing 
data is complex, the only adequate solution may be to "find out what the 
missing observations are." 

Section 2 discusses four probability models for the occurrence
of missing observations. Section 3 introduces notation and lists the
estimation and testing problems to be discussed. The succeeding three
sections derive solutions under each of the first three probability
models proposed, while Section 7 indicates how headway might be made
under Model 4. Then in Sections 8, 9, 10, and 11 the Model 1 and
Model 3 estimators are compared using asymptotic and small sample
results. Section 12 contains recommendations about procedures to use
for each of the estimation and testing problems discussed and problems
for further research.

^1 Janet D. Elashoff is Assistant Professor of Education at Stanford
University and a Research and Development Associate at SCRDT; Robert M.
Elashoff is Associate Professor of Biostatistics at the University of
California, San Francisco.

2. Probability Models for Incomplete Data 

This section discusses four general probability models proposed 
in the statistics literature to account for the occurrence of missing 
data. 

Assume that one independent variable x and one dependent
variable y are under study for each individual. Further assume that:
(1) no x observations are missing, (2) for each value of x occurring
in the study, a random sample of $N_x$ individuals is drawn, and $n_x$
individuals are observed on y and $N_x - n_x$ individuals are not
observed on y (their y values are "missing" and so unknown), (3) no
other variables have been measured.

Define 

    q(x,y) = Pr(an individual's y is observed | x, y).

In other words, among individuals with values x and y of the
independent and dependent variables, the probability that the value of
the dependent variable is not observed is 1 - q(x,y). Thus, the loss of
particular observations may be influenced by the actual values of the
dependent and independent variables.



Model 1: Randomly Missing Data

It is commonly assumed that missing observations have occurred at 
random or by chance. That is, neither the value of x nor the value 
of y influences whether an individual's y value is observed or not. 
Thus the random model states that q(x,y) , the probability that an 

individual's y value is observed, is independent of both x and y , 
or 

q(x,y) = q for all x and y . 

The random model is appropriate where factors completely independent 
of the variables under study are causing missing data or where a question 
y is asked of a random subsample of individuals surveyed. 

The random model is the basis for the frequent practice of "ignoring"
missing data, that is, analyzing only complete observations. The practice
of ignoring missing data is appropriate if the random model holds;
otherwise it may give misleading results (see Sections 8, 9, and 10).

Model 2: Independent Variable Influences Missing Data

Model 2 states that q(x,y) , the probability that an individual's 
y value is observed, is dependent on x but independent of the value 
of y , or 

    $q(x,y) = q_x$ for all y.

For example, suppose computer-assisted instruction is compared with
a conventional teaching method. Let x denote the teaching method. A
sample of $N_x$ students is taught by method x, and each student attains
a final score of y on material learned. Due to computer breakdowns
final scores y are missing for some students. In this example, the
independent variable, teaching method, but not the dependent variable,
final score, influences the probability that an observation is missing.

Model 3: Dependent Variable y Influences Missing Data 

Model 3 states that q(x,y) depends on the value of the dependent
variable y but is independent of the value of the independent variable
x:

    $q(x,y) = q_y$ for all x.

For example, suppose patients with a certain disease are assigned 
either an active drug or a placebo x in a double blind study. The 
placebo has the same side effects as the active drug, but presumably it 
does not have the same curative or palliative effect as the active drug. 

A follow-up study is made and each patient is scored as improved or 
unimproved y . Lack of improvement may cause some patients to drop out 
of the study or refuse to cooperate further. Improvement also may give 
patients a reason to drop out or a chance to leave the area. In both 
cases the y measurements are unknown. Clearly, in these circumstances, 
missing y's may be influenced by whether or not the patient is improved 
but not directly by the drug the patient received. 

Model 4: The Values of Both the Dependent and Independent
Variable Influence Missing Data

Model 4 states that q(x,y) depends on the value of the dependent 
variable y and the value of the independent variable x . Both an 
individual's y value and his x value affect the probability that 
his y value will be observed. 
Suppose, for example, that a prospective panel study is undertaken
to investigate differences in employment status y between the sexes x
in New England over a ten-year period. Some people will be lost to
follow-up in the course of the study because of emigration from the
region. Clearly employment status is one factor influencing emigration;
thus, employment status y influences whether an individual's employment
status is observed. Furthermore, the sexes have differential mobility,
so the independent variable x also influences whether an individual's
employment status is observed or not.
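
The four models differ only in which arguments q(x,y) is allowed to depend on. The sketch below makes that concrete and shows what the complete-case proportion r/n converges to in each case. All numerical probabilities and the function names are illustrative assumptions, not values from this study.

```python
# Illustrative (made-up) observation probabilities; none come from the source.
Q_CONSTANT = 0.8                                   # Model 1: same for all (x, y)
Q_BY_X = {1: 0.9, 2: 0.6}                          # Model 2: depends on x only
Q_BY_Y = {0: 0.5, 1: 0.8}                          # Model 3: depends on y only
Q_BY_XY = {(1, 0): 0.5, (1, 1): 0.9,
           (2, 0): 0.4, (2, 1): 0.7}               # Model 4: depends on both


def q(model, x, y):
    """Return q(x, y) = Pr(y is observed | x, y) under the given model."""
    if model == 1:
        return Q_CONSTANT
    if model == 2:
        return Q_BY_X[x]
    if model == 3:
        return Q_BY_Y[y]
    if model == 4:
        return Q_BY_XY[(x, y)]
    raise ValueError("model must be 1, 2, 3, or 4")


def naive_limit(model, x, p):
    """Large-sample value of r/n in population x when Pr(y = 1 | x) = p."""
    observed = p * q(model, x, 1) + (1 - p) * q(model, x, 0)
    return p * q(model, x, 1) / observed
```

Under Models 1 and 2 the limit equals p, so ignoring missing cases is harmless for estimating a single proportion; under Models 3 and 4 it generally does not.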

3. Two-Sample Problems for y Dichotomous

This section outlines five statistical problems involving the
comparison of two independent proportions [problems (a) through (e)
below] and presents the notation used in describing samples with missing
data.

Let $p_i$ be the probability that y equals one in population i:

    $p_i = \Pr(y = 1 \mid x = i)$.

The five statistical problems to be discussed are:

(a) To estimate $p_i$ for population i.

(b) To estimate the difference $d = p_1 - p_2$.

(c) To estimate the ratio $R = p_1/p_2$.

(d) To estimate the odds ratio $OR = \dfrac{p_1(1 - p_2)}{p_2(1 - p_1)}$.

(e) To test $H_0: p_1 = p_2$ against the alternative $H_1: p_1 \ne p_2$.

Random samples of $N_1$ and $N_2$ individuals are selected from the
two infinite populations denoted by x = 1 and x = 2. Suppose that
$n_i$ individuals are actually observed from each sample, $n_i \le N_i$
(i = 1, 2), so that $N_i - n_i$ observations are missing from each sample.
Let $r_i$ be the number of individuals for whom y = 1 out of the $n_i$
actually observed in population i; $r_i' = n_i - r_i$. Let $u_i$ be the
number of individuals with y = 1 in the $N_i - n_i$ individuals who
weren't observed; $u_i' = N_i - n_i - u_i$. The number of missing
observations $N_i - n_i$ is known but $u_i$ is not known. This notation
is summarized in Table 1.



TABLE 1
Notation

Population   Value of                            Actual number     Observed number
    x           y        P(y|x)       q(x,y)     in the sample     in the sample

    1           1        $p_1$        q(1,1)     $r_1 + u_1$       $r_1$
    1           0        $1 - p_1$    q(1,0)     $r_1' + u_1'$     $r_1'$
    1         Totals                             $N_1$             $n_1$

    2           1        $p_2$        q(2,1)     $r_2 + u_2$       $r_2$
    2           0        $1 - p_2$    q(2,0)     $r_2' + u_2'$     $r_2'$
    2         Totals                             $N_2$             $n_2$
Notice it is assumed that it is not feasible to make further efforts
to obtain the y-values for individuals whose y-values are missing.
Call-backs will not be carried out and further data on other measured
variables will not enable us to obtain "good" predicted values of y. These
stringent restrictions are relaxed only in the discussion of Model 4.
4. Randomly Missing Data: Statistical Techniques for
Problems (a) Through (e) Under Model 1

When Model 1 is correct and missing observations occur at random,
the $N_1 - n_1$ and $N_2 - n_2$ missing observations are ignored and the
remaining observations are regarded as random samples of size $n_1$ and
$n_2$ respectively. Standard statistical techniques are applied to these
random samples. The maximum likelihood (ML) estimator of $p_i$ under
Model 1 is $\hat p_{1i} = r_i/n_i$ and the ML estimators of d, R, and OR
are obtained by substituting $\hat p_{1i}$ for $p_i$ in each of these
expressions. The conditional and unconditional means and variances of the
estimators of $p_i$, d, R, and OR are given in Tables 2, 3, and 4.
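
As a minimal sketch of the Model 1 (complete-case) estimators just described, the function below computes the estimates of $p_1$, $p_2$, d, R, and OR from the observed counts alone. The function name and the example counts are illustrative assumptions.

```python
def model1_estimates(r1, n1, r2, n2):
    """Complete-case (Model 1) estimates.

    r_i = observed cases with y = 1, n_i = observed cases in sample i.
    The missing N_i - n_i cases are ignored entirely.
    """
    p1, p2 = r1 / n1, r2 / n2
    return {
        "p1": p1,
        "p2": p2,
        "d": p1 - p2,                               # difference
        "R": p1 / p2,                               # ratio
        "OR": (p1 * (1 - p2)) / (p2 * (1 - p1)),    # odds ratio
    }


# Hypothetical data: 30 of 60 observed in sample 1, 20 of 80 in sample 2.
example = model1_estimates(30, 60, 20, 80)
```

R and OR are undefined when a denominator proportion is 0 (or, for OR, when the first proportion is 1); the sketch does not guard against those cases.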

Alternative estimators for R and OR or simple functions of these
quantities have been derived and studied under Model 1. For example,
Haldane (1955) and Anscombe (1956) recommend that log OR should be
estimated by substituting $\hat p_{1i} + (1/2n_i)$ for $\hat p_{1i}$ and
$[(1 - \hat p_{1i}) + (1/2n_i)]$ for $(1 - \hat p_{1i})$ in the expression
$\widehat{OR}_1$ to reduce bias (see Table 2). Since the primary focus of
this study is comparison of estimators under different models for the
missing data, such modifications were not investigated.

For the conditional mean of an estimator the expectation of the estimator
is taken conditional upon the observed $n_i$; the unconditional mean is
not conditioned upon the $n_i$. In the development of the asymptotic means
and variances it is assumed that

(1)   $\tau_i = \lim n_i/N_i > 0$

(2)   $\lambda = \lim N_1/(N_1 + N_2) > 0$.

Occasionally, $\lambda_1 = \lambda$ and $\lambda_2 = (1 - \lambda)$ will be used.
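The statistics $\hat p_{1i}$, $\hat d_1$, $\hat R_1$, and $\widehat{OR}_1$
have asymptotic normal distributions conditionally and unconditionally
with the means and variances shown in Tables 2, 3, and 4.

TABLE 2

Conditional and Unconditional Means (Assuming Model 1)

Estimator                                                                       Mean

$\hat p_{1i}$                                                                   $p_i$
$\hat d_1 = \hat p_{11} - \hat p_{12}$                                          $p_1 - p_2$
$\hat R_1 = \hat p_{11}/\hat p_{12}$                                            $p_1/p_2$   [asymptotic]
$\widehat{OR}_1 = \hat p_{11}(1-\hat p_{12})/\hat p_{12}(1-\hat p_{11})$        $p_1(1-p_2)/p_2(1-p_1)$   [asymptotic]

TABLE 3

Asymptotic Conditional Variance Under Model 1

Estimator                            Variance

$\sqrt{N_1+N_2}\,\hat p_{1i}$        $\dfrac{p_i(1-p_i)}{\lambda_i \tau_i}$   [exact]

$\sqrt{N_1+N_2}\,\hat d_1$           $\dfrac{p_1(1-p_1)}{\lambda\tau_1} + \dfrac{p_2(1-p_2)}{(1-\lambda)\tau_2}$   [exact]

$\sqrt{N_1+N_2}\,\hat R_1$           $\dfrac{1}{p_2^2}\left[\dfrac{p_1(1-p_1)}{\tau_1\lambda} + \left(\dfrac{p_1}{p_2}\right)^2 \dfrac{p_2(1-p_2)}{\tau_2(1-\lambda)}\right]$

$\sqrt{N_1+N_2}\,\widehat{OR}_1$     $\dfrac{p_1(1-p_2)}{p_2^2(1-p_1)^2}\left[\dfrac{1-p_2}{(1-p_1)\tau_1\lambda} + \dfrac{p_1}{p_2\tau_2(1-\lambda)}\right]$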
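TABLE 4

Asymptotic Unconditional Variance Under Model 1

Estimator                            Variance

$\sqrt{N_1+N_2}\,\hat p_{1i}$        $\dfrac{p_i(1-p_i)}{q\lambda_i}$

$\sqrt{N_1+N_2}\,\hat d_1$           $\dfrac{(1-\lambda)p_1(1-p_1) + \lambda p_2(1-p_2)}{q\lambda(1-\lambda)}$

$\sqrt{N_1+N_2}\,\hat R_1$           $\dfrac{p_1\left[p_2(1-p_1) - \lambda(p_2-p_1)\right]}{q\lambda(1-\lambda)p_2^3}$

$\sqrt{N_1+N_2}\,\widehat{OR}_1$     $\dfrac{p_1(1-p_2)\left[p_2(1-p_2)(1-\lambda) + p_1(1-p_1)\lambda\right]}{q\lambda(1-\lambda)p_2^3(1-p_1)^3}$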

A test of $H_0: p_1 = p_2$ against one or two-sided alternatives
may be carried out using Fisher's exact test. Naturally, the power of
the test based on sample sizes $n_i$ will be less than that based on
sample sizes $N_i$.
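
To illustrate the test just mentioned, here is a small standard-library sketch of a two-sided Fisher exact test applied to the observed cases only. The two-sided rule used (summing all tables no more probable than the observed one) is one common convention and is an assumption here; the source does not say which rule was intended.

```python
from math import comb


def fisher_exact_two_sided(r1, n1, r2, n2):
    """Two-sided Fisher exact p-value for H0: p1 = p2 from observed cases.

    r_i = observed cases with y = 1 out of n_i observed in sample i.
    """
    successes = r1 + r2
    denom = comb(n1 + n2, successes)

    def prob(a):
        # Hypergeometric probability that sample 1 contains a successes,
        # given both margins of the 2x2 table.
        return comb(n1, a) * comb(n2, successes - a) / denom

    lo = max(0, successes - n2)
    hi = min(n1, successes)
    p_obs = prob(r1)
    # Sum over every table at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(a) for a in range(lo, hi + 1)
               if prob(a) <= p_obs * (1 + 1e-9))
```

For example, `fisher_exact_two_sided(3, 4, 1, 4)` sums the hypergeometric probabilities 16/70, 1/70, 1/70, and 16/70.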

5. The Independent Variable Influences Missing Data (Model 2):
Statistical Techniques for Problems (a) Through (e)

In this model the probability of observing the particular y score
for a particular individual is independent of the value of y but does
depend on the population sampled. The estimators defined under Model 1
for $p_i$, d, R, and OR are also the ML estimators assuming Model 2,
and they have the same conditional means and variances under Model 2 as
under Model 1 (see Tables 2 and 3). Moreover, the asymptotic unconditional
means are also the same. However, the unconditional variances
under Model 2 are different from those under Model 1 (see Table 5).

TABLE 5

Asymptotic Unconditional Variance Under Model 2

Estimator                            Variance

$\sqrt{N_1+N_2}\,\hat p_{1i}$        $\dfrac{p_i(1-p_i)}{\lambda_i q_i}$

$\sqrt{N_1+N_2}\,\hat d_1$           $\dfrac{p_1(1-p_1)}{\lambda q_1} + \dfrac{p_2(1-p_2)}{(1-\lambda)q_2}$

$\sqrt{N_1+N_2}\,\hat R_1$           $\dfrac{p_1(1-p_1)}{\lambda q_1 p_2^2} + \dfrac{p_1^2(1-p_2)}{(1-\lambda)q_2 p_2^3}$

$\sqrt{N_1+N_2}\,\widehat{OR}_1$     $\dfrac{p_1(1-p_2)\left[p_2(1-p_2)q_2(1-\lambda) + p_1(1-p_1)q_1\lambda\right]}{\lambda(1-\lambda)p_2^3(1-p_1)^3 q_1 q_2}$

(In Table 5, $q_i$ denotes q(i,y), the probability under Model 2 that y is
observed in population i.)

It is possible to test whether Model 1 or Model 2 applies in a
particular problem. The null hypothesis is $H_0: q_x = q$ for x = 1, 2;
the alternative hypothesis is $H_1: q_1 \ne q_2$. Fisher's Exact Test may
be used to carry out a test conditional on the $N_i$ and $(n_1 + n_2)$.

To test $H_0: p_1 = p_2$ against one or two-sided alternatives use the
same tests as if Model 1 obtains.

6. The Dependent Variable Influences Missing Data (Model 3):
Statistical Techniques for Problems (a) Through (e)

Under Model 3, the value of the dependent variable y influences
the probability that an individual's y value will be observed. The
independent variable x does not influence the probability of a missing
observation. Therefore

(3)   $q(1,y) = q(2,y) = q_y$   for y = 0, 1.



The maximum likelihood equations for Model 3 have quadratic and cross
product terms in the p's and q's. For example

    $\dfrac{\partial \ln L}{\partial q_0} = \dfrac{(n_1 - r_1) + (n_2 - r_2)}{q_0} - \dfrac{(N_1 - n_1)(1 - p_1)}{1 - p_1 q_1 - (1 - p_1) q_0} - \dfrac{(N_2 - n_2)(1 - p_2)}{1 - p_2 q_1 - (1 - p_2) q_0} = 0$.

Consequently, simple estimators are of interest. Eklund (1959) argues
that if there were no missing observations, the $p_i$ might be estimated
by $\hat p_i = (r_i + u_i)/N_i$. Therefore, estimating the q's as

(4)   $\hat q(i,1) = \dfrac{r_i}{r_i + u_i}$,   $\hat q(i,0) = \dfrac{r_i'}{r_i' + u_i'}$

and using relationship (3) yields equations

(5)   $\dfrac{r_1}{r_1 + u_1} = \dfrac{r_2}{r_2 + u_2}$,   $\dfrac{r_1'}{r_1' + u_1'} = \dfrac{r_2'}{r_2' + u_2'}$.

Solving for $u_i$ and $u_i'$ yields estimates

(6)   $\hat u_1 = r_1\left[\dfrac{N_2 r_1' - N_1 r_2'}{r_1' r_2 - r_2' r_1} - 1\right]$,   $\hat u_2 = \dfrac{r_2}{r_1}\,\hat u_1$.
This leads to estimating the $q_y$ and $p_i$ as

(7)   $\hat q_1 = \dfrac{n_1 r_2 - n_2 r_1}{N_2 r_1' - N_1 r_2'}$,   $\hat q_0 = \dfrac{n_1 r_2 - n_2 r_1}{N_1 r_2 - N_2 r_1}$

(8)   $\hat p_{3i} = \dfrac{r_i}{N_i \hat q_1} = \dfrac{r_i}{N_i}\cdot\dfrac{N_2 r_1' - N_1 r_2'}{n_1 r_2 - n_2 r_1} = \dfrac{r_i}{N_i}\cdot\dfrac{N_2(n_1 - r_1) - N_1(n_2 - r_2)}{n_1 r_2 - n_2 r_1}$.

It can be shown that (8) is indeed a consistent estimator of $p_i$.
Using this estimator for $p_i$, possible estimators for d, R, and OR
are $\hat d_3 = \hat p_{31} - \hat p_{32}$, $\hat R_3 = \hat p_{31}/\hat p_{32}$,
and $\widehat{OR}_3 = \hat p_{31}(1 - \hat p_{32})/\hat p_{32}(1 - \hat p_{31})$
respectively. Note that the estimator of OR, $\widehat{OR}_3$, is identical
to $\widehat{OR}_1$. Under Model 3 these estimators have asymptotic normal
distributions and are asymptotically unbiased and consistent, both
conditionally and unconditionally. The asymptotic conditional and
unconditional variances are shown in Tables 6 and 7.
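
Equations (7) and (8) can be sketched in a few lines. The function below is an illustrative implementation with hypothetical names; it writes $r_i' = n_i - r_i$ and raises an error in the cases where a denominator vanishes. The example counts are constructed (not taken from the study) so that they equal their expected values for $p_1 = .2$, $p_2 = .5$, $q_1 = .8$, $q_0 = .5$, $N_1 = N_2 = 1000$.

```python
def model3_estimates(N1, n1, r1, N2, n2, r2):
    """Eklund-type Model 3 estimates following equations (7) and (8)."""
    r1p, r2p = n1 - r1, n2 - r2          # observed cases with y = 0
    cross = n1 * r2 - n2 * r1            # common numerator of q1-hat, q0-hat
    den_q1 = N2 * r1p - N1 * r2p
    den_q0 = N1 * r2 - N2 * r1
    if cross == 0 or den_q1 == 0 or den_q0 == 0:
        raise ValueError("Model 3 estimator is undefined for these counts")
    q1 = cross / den_q1                  # Pr(observed | y = 1)
    q0 = cross / den_q0                  # Pr(observed | y = 0)
    p1 = r1 / (N1 * q1)
    p2 = r2 / (N2 * q1)
    return {
        "q1": q1, "q0": q0, "p1": p1, "p2": p2,
        "d": p1 - p2,
        "R": p1 / p2,
        "OR": p1 * (1 - p2) / (p2 * (1 - p1)),
    }


# Noise-free illustrative counts: n1 = 560 with r1 = 160, n2 = 650 with r2 = 400.
est = model3_estimates(1000, 560, 160, 1000, 650, 400)
```

With these counts the estimates recover the generating values exactly, and the odds ratio coincides with the complete-case value $r_1 r_2'/(r_2 r_1')$, as noted in the text.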

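TABLE 6

Asymptotic Conditional Variance Under Model 3

Estimator                            Variance

$\sqrt{N_1+N_2}\,\hat p_{31}$        $\dfrac{\theta_1(1-\theta_1)}{\tau_2^2(\theta_2-\theta_1)^4}\left\{\dfrac{1}{\lambda\tau_1}\left[\tau_1(\theta_2 + \theta_1^2 - 2\theta_1\theta_2) - \tau_2\theta_2(1-\theta_2)\right]^2 + \dfrac{1}{(1-\lambda)\tau_2}\,\theta_1\theta_2(1-\theta_1)(1-\theta_2)(\tau_1-\tau_2)^2\right\}$

$\sqrt{N_1+N_2}\,\hat d_3$           $\dfrac{1}{\tau_1^2\tau_2^2(\theta_2-\theta_1)^4}\left\{\dfrac{\theta_1(1-\theta_1)}{\lambda\tau_1}\left[\theta_2(1-\theta_2)(\tau_1-\tau_2)^2 + (\theta_2-\theta_1)^2\tau_1^2\right]^2 + \dfrac{\theta_2(1-\theta_2)}{(1-\lambda)\tau_2}\left[\theta_1(1-\theta_1)(\tau_1-\tau_2)^2 + (\theta_2-\theta_1)^2\tau_2^2\right]^2\right\}$

$\sqrt{N_1+N_2}\,\hat R_3$           $\dfrac{\tau_1^2}{\tau_2^2\theta_2^2}\left[\dfrac{\theta_1(1-\theta_1)}{\tau_1\lambda} + \dfrac{\theta_1^2(1-\theta_2)}{\theta_2\tau_2(1-\lambda)}\right]$

$\sqrt{N_1+N_2}\,\widehat{OR}_3$     $\dfrac{\theta_1(1-\theta_2)}{\theta_2^2(1-\theta_1)^2}\left[\dfrac{1-\theta_2}{\tau_1\lambda(1-\theta_1)} + \dfrac{\theta_1}{\tau_2(1-\lambda)\theta_2}\right]$

where

    $\theta_i = E\left(\dfrac{r_i}{n_i}\right) = \dfrac{p_i q_1}{p_i q_1 + (1 - p_i) q_0}$.

Notice that the Model 3 estimator for $p_i$ fails for $p_1 = p_2$;
both asymptotic variances are infinite for this case. Basically, for
$p_1 = p_2 = p$ there is insufficient information in the samples to estimate
p, $q_0$, and $q_1$. Thus we may not be able to obtain reasonable estimates
of $p_1$ and $p_2$ using this procedure in cases where $p_1$ is close to $p_2$.
To illustrate, consider the case $N_1 = N_2 = N$. When $n_1 = n_2$, then
$\hat p_{3i} = \hat p_{1i}$. Note, however, that if $r_2/n_2 = r_1/n_1$,
$\hat q_1 = 0$ and $\hat p_{3i}$ is undefined. If $n_1 - n_2 = r_1 - r_2$ then
$\hat q_1 = (n_1 r_2 - n_2 r_1)/(N \cdot 0)$, yielding $\hat p_{3i} = 0$,
another nonsense estimate. Even worse, $\hat p_{31}$ and $\hat p_{32}$ may
both be negative; this will occur if

    $r_1 - (n_1 - n_2) < r_2 < \dfrac{n_2 r_1}{n_1}$   or   $\dfrac{n_2 r_1}{n_1} < r_2 < r_1 - (n_1 - n_2)$.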
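
The failure cases described for equal sample sizes can be checked numerically. In this sketch (function name and counts are hypothetical), $N_1 = N_2$ so that N cancels from equation (8) and the estimate reduces to $r_i\,(r_1' - r_2')/(n_1 r_2 - n_2 r_1)$.

```python
def p3_equal_N(r1, n1, r2, n2):
    """Model 3 estimates (p31-hat, p32-hat) when N1 = N2, or None if undefined."""
    cross = n1 * r2 - n2 * r1
    if cross == 0:
        return None                              # r1/n1 == r2/n2: undefined
    factor = ((n1 - r1) - (n2 - r2)) / cross     # (r1' - r2') / (n1 r2 - n2 r1)
    return r1 * factor, r2 * factor


same_n = p3_equal_N(20, 50, 10, 50)      # n1 == n2: reduces to r_i / n_i
negative = p3_equal_N(20, 50, 12, 40)    # 10 < r2 < 16: both estimates negative
undefined = p3_equal_N(20, 50, 8, 20)    # equal observed proportions
zero = p3_equal_N(20, 50, 10, 40)        # n1 - n2 == r1 - r2: both estimates 0
```

The second call falls in the first of the two ranges given above, since $r_1 - (n_1 - n_2) = 10$ and $n_2 r_1/n_1 = 16$.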
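TABLE 7

Asymptotic Unconditional Variance Under Model 3

Estimator                            Variance

$\sqrt{N_1+N_2}\,\hat p_{31}$        $\dfrac{p_1(1-p_1)}{\lambda(1-\lambda)q_0 q_1(p_2-p_1)^2}\left\{p_2(1-p_1)q_0\left[p_2 - \lambda(p_2-p_1)\right] + p_1(1-p_2)q_1\left[(1-p_2) + \lambda(p_2-p_1)\right] - p_1(1-p_1)q_1 q_0\right\}$

$\sqrt{N_1+N_2}\,\hat d_3$           $\dfrac{1}{\lambda(1-\lambda)q_0 q_1}\left\{p_1 p_2 q_0\left[p_2 - \lambda(p_2-p_1)\right] + (1-p_1)(1-p_2)q_1\left[1 - p_2 + \lambda(p_2-p_1)\right] - q_0 q_1(1-p_1-p_2)^2\right\}$

$\sqrt{N_1+N_2}\,\hat R_3$           $\dfrac{p_1\left[(1-\lambda)p_2 + \lambda p_1 - p_1 p_2 q_1\right]}{\lambda(1-\lambda)q_1 p_2^3}$

$\sqrt{N_1+N_2}\,\widehat{OR}_3$     $\dfrac{p_1(1-p_2)}{\lambda(1-\lambda)q_0 q_1 p_2^3(1-p_1)^3}\left\{p_2(1-p_2)\left[p_1 q_1 + (1-p_1)q_0\right](1-\lambda) + p_1(1-p_1)\left[p_2 q_1 + (1-p_2)q_0\right]\lambda\right\}$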
This same problem is reflected in the behavior of the maximum likelihood
estimators for Model 3. When $p_1 = p_2$, the information matrix is
singular. For $p_1 \ne p_2$, numerical comparisons for parameter values
listed below^2 indicate that the asymptotic variances of $\hat p_{3i}$,
$\hat d_3$, $\hat R_3$, and $\widehat{OR}_3$ are identical with those of
the ML estimators of $p_i$, d, R, and OR.

Detailed investigations of the behavior of the Model 3 estimators
in large and small samples are reported in Sections 9, 10, and 11, while
in Section 12 the testing of $H_0: p_1 = p_2$ is discussed.

^2 Variance ratios were evaluated for $p_1$ = .1, .25, .50, .75, .90;
$p_2$ = .1, .25, .50, .75, .90; $q_1$ = .5, .75, .90, 1.0; $q_0$ = .5, .75,
.90, 1.0.

7. Both Variables Influence Missing Data (Model 4):
Statistical Techniques for Problems (a) Through (e)

In Model 4, the probability that a particular observation is missing
depends on both the value of x, the independent variable, and the value
of y, the dependent variable. Therefore, the probability that a
particular y observation is missing is different for each of the four x,y
combinations. Without further assumptions or additional information, it
is impossible to obtain consistent estimators of the $p_i$. No detailed
studies of problems (a) through (e) were carried out for Model 4 since
entirely new problems arise when this model holds. The following are
four possible lines of attack.

(a) Assumptions can be made about relationships among the four 
probabilities q(x,y) which would allow the use of techniques 
obtained for Model 2 or Model 3. For example, assume that missing 
observations are twice as likely in population 1 as in population 2. 

(b) Estimates of the probabilities q(x,y) may be obtained by a 
pilot study or intensive subsampling of nonrespondents (see e.g., 
Cochran, 1963). 

(c) Use of some related variable z can be made. For instance,
if a dichotomous variable z affects the probability distribution
of y but does not influence q(x,y), then Eklund (1959) has
developed consistent estimators of the $p_i$.

(d) Estimators based upon Models 1, 2, and 3 could be employed if
the magnitude of the biases when Model 4 holds were ascertained and
the corresponding standard error formulae changed. That is, a
robustness study could be made to find out the conditions under which
these Model 1, Model 2, and Model 3 estimators give reasonable results.
This point will be discussed in later sections.



8. Estimators of the $p_i$

In this section the concern is only with how well the $p_i$ are
estimated and not with how to estimate the variance of $\hat p_i$. Since the
Model 1 and Model 2 estimators of the $p_i$ are the same, the estimation
problem is reduced to a comparison of the behavior of $\hat p_{1i}$ and
$\hat p_{3i}$ under Models 1 and 3. How much is lost if it is assumed
observations were missing at random, if in fact $q_0 \ne q_1$? How much is
lost by using the Model 3 estimators even though $q_0 = q_1$? To answer these
questions it is necessary to examine asymptotic unconditional results
for the bias, variance, and mean-squared error of the Model 1 and
Model 3 estimators of $p_i$ under Model 1 and Model 3. Since comparisons
between $p_1$ and $p_2$ are the major interest, small sample work is
reported only for d, R, and OR (see Sections 9, 10, 11, and 12).

The Model 1 and Model 3 estimators for $p_1$ are

    $\hat p_{11} = \dfrac{r_1}{n_1}$

    $\hat p_{31} = \dfrac{r_1}{N_1}\cdot\dfrac{N_2(n_1 - r_1) - N_1(n_2 - r_2)}{n_1 r_2 - n_2 r_1}$.
The estimator $\hat p_{31}$ is asymptotically unbiased with conditional and
unconditional asymptotic variances given in Tables 6 and 7. Results
for $\hat p_{11}$ under Model 3 are given in Table 8.

TABLE 8

Asymptotic Behavior of $\hat p_{11}$ Under Model 3

$E(\hat p_{11})$                            $\theta_1$   [exact]

Bias $(\hat p_{11})$                        $\dfrac{(q_1 - q_0)\,p_1(1-p_1)}{p_1 q_1 + (1-p_1)q_0}$   [exact]

Var $\sqrt{N_1+N_2}\,\hat p_{11}$
    conditional                             $\dfrac{\theta_1(1-\theta_1)}{\tau_1\lambda}$   [exact]
    unconditional                           $\dfrac{\theta_1^2(1-\theta_1)}{p_1 q_1 \lambda}$

Suppose Model 1 is true and $q_1 = q_0 = q$; how much is lost by
using the Model 3 estimator of $p_1$? For simplicity, let $p_2 = p_1 + \Delta$
and $N_1 = N_2 = N$. Then under Model 1 both estimators are asymptotically
unbiased and the unconditional variance formulas for $\hat p_{11}$ and
$\hat p_{31}$ become

    Var $(\hat p_{11}) = \dfrac{p_1(1-p_1)}{qN}$

    Var $(\hat p_{31}) = \dfrac{2\,p_1(1-p_1)}{qN\Delta^2}\left[\Delta^2/2 + (1-q)\,p_1(1-p_1)\right]$,

yielding

    $\dfrac{\mathrm{Var}(\hat p_{31})}{\mathrm{Var}(\hat p_{11})} = 1 + \dfrac{2(1-q)\,p_1(1-p_1)}{\Delta^2}$.

Under Model 1 then, $\hat p_{31}$ always has a larger variance than
$\hat p_{11}$ and gets worse in comparison with $\hat p_{11}$ as $p_1$
approaches 0.5, as q approaches zero (the proportion of missing data
increases) and as $\Delta = p_2 - p_1$ approaches zero.

Under Model 3, the asymptotic unconditional formulas for mean-squared
errors are:

    MSE $(\hat p_{11}) = \dfrac{\theta_1^2}{q_1^2}\left[(q_1 - q_0)^2(1-p_1)^2 + \dfrac{q_1(1-\theta_1)}{N_1 p_1}\right]$

    MSE $(\hat p_{31}) = \dfrac{(N_1+N_2)\,p_1(1-p_1)}{N_1 N_2\,q_1 q_0\,(p_2-p_1)^2}\left\{p_2(1-p_1)q_0\left[p_2 - \dfrac{N_1}{N_1+N_2}(p_2-p_1)\right] + p_1(1-p_2)q_1\left[(1-p_2) + \dfrac{N_1}{N_1+N_2}(p_2-p_1)\right] - p_1(1-p_1)q_1 q_0\right\}$.

As $\Delta$ approaches zero, MSE $(\hat p_{11})$ will be smaller than
MSE $(\hat p_{31})$. However, for $p_1 \ne p_2$ and N large, the bias in
$\hat p_{11}$, which increases with $|q_1 - q_0|$, will make $\hat p_{31}$
preferable. In small samples, $\hat p_{31}$ is biased and may have a larger
variance than asymptotic results indicate.
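
The trade-off between the bias of the Model 1 estimator and the variance of the Model 3 estimator can be explored numerically. The sketch below codes the asymptotic unconditional MSE of each estimator of $p_1$; the algebraic forms are a reconstruction from the definitions of $\theta_1$, $\lambda = N_1/(N_1+N_2)$, and the bias of $\hat p_{11}$, so they should be read as assumptions rather than the printed formulas. Function names are hypothetical.

```python
def mse_p11(N1, p1, q1, q0):
    """Asymptotic unconditional MSE of the complete-case estimator r1/n1."""
    pi1 = p1 * q1 + (1 - p1) * q0            # Pr(y observed) in population 1
    theta1 = p1 * q1 / pi1                   # large-sample value of r1/n1
    bias = theta1 - p1
    variance = theta1 ** 2 * (1 - theta1) / (N1 * p1 * q1)
    return bias ** 2 + variance


def mse_p31(N1, N2, p1, p2, q1, q0):
    """Asymptotic unconditional MSE (= variance) of the Model 3 estimator."""
    lam = N1 / (N1 + N2)
    delta = p2 - p1                          # must be non-zero
    brace = (p2 * (1 - p1) * q0 * (p2 - lam * delta)
             + p1 * (1 - p2) * q1 * ((1 - p2) + lam * delta)
             - p1 * (1 - p1) * q0 * q1)
    scale = (N1 + N2) * lam * (1 - lam) * q0 * q1 * delta ** 2
    return p1 * (1 - p1) * brace / scale


# Illustrative check under Model 1 (q1 = q0 = q) with N1 = N2:
ratio = mse_p31(100, 100, 0.3, 0.5, 0.8, 0.8) / mse_p11(100, 0.3, 0.8, 0.8)
```

With $q_1 = q_0 = q$ and $N_1 = N_2$ the ratio agrees with $1 + 2(1-q)p_1(1-p_1)/\Delta^2$, which is 3.1 for the values used here.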

9. Comparisons of Model 1 and Model 3 Estimators of d

In this section the unconditional asymptotic and exact small sample
behavior of estimators $\hat d_1$ and $\hat d_3$ under Models 1 and 3 are
compared.

Model 1 and Model 3 estimators of $d = p_1 - p_2$ are:

    $\hat d_1 = \dfrac{r_1}{n_1} - \dfrac{r_2}{n_2}$

    $\hat d_3 = \left(\dfrac{r_1}{N_1} - \dfrac{r_2}{N_2}\right)\left(\dfrac{N_2(n_1 - r_1) - N_1(n_2 - r_2)}{n_1 r_2 - n_2 r_1}\right)$.

Results of the comparison indicate that the Model 1 estimator will
be preferable for $p_1 = p_2$, for $q_0 = q_1$, and for small N ($N \le 50$).
For $q_0 \ne q_1$, $|p_1 - p_2| \ne 0$, $\hat d_3$ will look better for large N.
Next the three situations $p_1 = p_2$, $q_1 = q_0$, and the general case of
Model 3 are discussed by comparing asymptotic results and by examining
exact bias and mean square error for samples of $N_1 = N_2 = 20, 50$.

The Model 3 estimator, $\hat d_3$, is asymptotically unbiased with
conditional and unconditional variances given in Tables 6 and 7. The
behavior of $\hat d_1$ under Model 3 is given in Table 9.



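TABLE 9

Asymptotic Behavior of $\hat d_1$ Under Model 3

$E(\hat d_1)$                               $\theta_1 - \theta_2$   [exact]

Bias $(\hat d_1)$                           $(q_1 - q_0)\left[\dfrac{p_1(1-p_1)}{p_1 q_1 + (1-p_1)q_0} - \dfrac{p_2(1-p_2)}{p_2 q_1 + (1-p_2)q_0}\right]$   [exact]

Var $\sqrt{N_1+N_2}\,\hat d_1$
    conditional                             $\dfrac{\theta_1(1-\theta_1)}{\tau_1\lambda} + \dfrac{\theta_2(1-\theta_2)}{\tau_2(1-\lambda)}$   [exact]
    unconditional                           $\dfrac{\theta_1^2(1-\theta_1)}{\lambda p_1 q_1} + \dfrac{\theta_2^2(1-\theta_2)}{(1-\lambda)p_2 q_1}$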
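
The bias of the complete-case difference under Model 3 follows directly from $\theta_i = p_i q_1/[p_i q_1 + (1-p_i) q_0]$. The short sketch below (hypothetical function names) evaluates it; the parameter values in the example are taken from the grid used for the exact results (q values of .50 and .999).

```python
def theta(p, q1, q0):
    """Large-sample value of r/n when Pr(y = 1) = p under Model 3."""
    return p * q1 / (p * q1 + (1 - p) * q0)


def bias_d1(p1, p2, q1, q0):
    """Bias of d1-hat = r1/n1 - r2/n2 as an estimator of d = p1 - p2."""
    return (theta(p1, q1, q0) - theta(p2, q1, q0)) - (p1 - p2)


# p1 = .10, p2 = .50 with the two most extreme (q1, q0) pairs on the grid.
largest = bias_d1(0.10, 0.50, 0.50, 0.999)
smallest = bias_d1(0.10, 0.50, 0.999, 0.50)
```

These two values come out near .119 and -.085. The bias vanishes whenever $q_1 = q_0$, and it does not depend on N.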



Exact unconditional results for bias, variance and mean-square error
were obtained for $\hat d_1$ and $\hat d_3$ for $N_1 = N_2 = 20, 50$, for
400 sets of parameter values $p_1, p_2$ = .10, .25, .50, .75, .90;
$q_1, q_0$ = .50, .75, .90, 1.0. Results are summarized in Tables 10, 11,
12, and 13. Notice that except for sign changes in the bias, results for
$p_1, p_2$ are identical to results for $p_2, p_1$ and, with $q_0, q_1$
reversed, to results for $1-p_1, 1-p_2$ and $1-p_2, 1-p_1$. Results were
obtained conditional on $n_1 \ne 0$, $n_2 \ne 0$; for $n_1 r_2 = n_2 r_1$,
$\hat d_3$ was defined to be 0.
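
Exact small-sample results of this kind can be obtained by enumerating every possible outcome of the two samples. The sketch below is one way to do so for the bias of $\hat d_3$ with $N_1 = N_2 = N$; it is not the authors' program. It assumes each sample is trinomial over (y = 1 observed, y = 0 observed, missing), conditions on $n_1, n_2 \ne 0$, and sets $\hat d_3 = 0$ whenever $n_1 r_2 = n_2 r_1$.

```python
from math import comb


def trinomial_cells(N, p, q1, q0):
    """Return (r, r_prime, probability) for every outcome of one sample."""
    a = p * q1                       # Pr(y = 1 and observed)
    b = (1 - p) * q0                 # Pr(y = 0 and observed)
    c = max(0.0, 1 - a - b)          # Pr(y missing)
    cells = []
    for r in range(N + 1):
        for rp in range(N - r + 1):
            m = N - r - rp
            cells.append((r, rp, comb(N, r) * comb(N - r, rp)
                          * a ** r * b ** rp * c ** m))
    return cells


def exact_bias_d3(N, p1, p2, q1, q0):
    """Exact bias of d3-hat for N1 = N2 = N, conditional on n1, n2 > 0."""
    total = mass = 0.0
    cells2 = trinomial_cells(N, p2, q1, q0)
    for r1, r1p, pr1 in trinomial_cells(N, p1, q1, q0):
        n1 = r1 + r1p
        if n1 == 0:
            continue
        for r2, r2p, pr2 in cells2:
            n2 = r2 + r2p
            if n2 == 0:
                continue
            cross = n1 * r2 - n2 * r1
            # With N1 = N2 the factor N cancels from the estimator.
            d3 = 0.0 if cross == 0 else (r1 - r2) * (r1p - r2p) / cross
            total += pr1 * pr2 * d3
            mass += pr1 * pr2
    return total / mass - (p1 - p2)
```

For N = 20 the double loop visits about 53,000 outcome pairs, which runs quickly. When $p_1 = p_2$ the two samples are exchangeable and the bias is zero by symmetry, which gives a convenient check on the enumeration.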

When $p_1 = p_2 = p$, both estimators are unbiased in large and small
samples. The asymptotic unconditional variances of $\hat d_1$ and $\hat d_3$
respectively become

    $\dfrac{q_1 q_0\,p(1-p)}{\lambda(1-\lambda)\left(p q_1 + (1-p) q_0\right)^3}$

and

    $\dfrac{1}{q_0 q_1 \lambda(1-\lambda)}\left[q_0 p^3 + q_1(1-p)^3 - q_0 q_1(1-2p)^2\right]$.

Table 10a shows the ratio of the unconditional asymptotic variance
formulas for several values of p, $q_0$, and $q_1$. (Note that the
conditional variance of $\hat d_3$ is infinite for $p_1 = p_2$.) The ratio is
always less than 1.0, indicating that for $p_1 = p_2$, $\hat d_1$ is to be
preferred. Table 12a shows the exact ratio; $\hat d_1$ is even more strongly
preferable in small samples.
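
The ratio tabulated in Table 10a can be recomputed from the two variance formulas for $p_1 = p_2 = p$. In this sketch (hypothetical function name) the common factor $1/[\lambda(1-\lambda)]$ cancels, so neither $\lambda$ nor N appears.

```python
def var_ratio_equal_p(p, q1, q0):
    """Ratio Var(d1-hat) / Var(d3-hat), asymptotic unconditional, p1 = p2 = p."""
    pi = p * q1 + (1 - p) * q0                       # Pr(y observed)
    var_d1 = q1 * q0 * p * (1 - p) / pi ** 3
    var_d3 = (q0 * p ** 3 + q1 * (1 - p) ** 3
              - q0 * q1 * (1 - 2 * p) ** 2) / (q0 * q1)
    return var_d1 / var_d3


at_half = var_ratio_equal_p(0.50, 1.0, 0.5)      # about .790
at_tenth = var_ratio_equal_p(0.10, 0.5, 0.5)     # about .220
```

Both values match the minima reported for p = .50 and p = .10 in Table 10a to three decimals.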

When $q_1 = q_0$, that is, when Model 1 obtains, $\hat d_1$ is unbiased in
large and small samples; $\hat d_3$ is unbiased in large samples but has bias
ranging from .001 to .075 in absolute value for samples of size 20
and from .001 to .045 for samples of size 50 (see Table 11c). The
bias ranges up to 39 and 26 percent of d for samples of size 20
and 50 respectively. The asymptotic variance formulas for $N_1 = N_2 = N$
are related by

    Var $\hat d_3$ = Var $\hat d_1 + \dfrac{2}{qN}(1-q)(1 - p_1 - p_2)^2$.

They are equal only for N infinite, q = 1 or $p_1 + p_2 = 1$; otherwise
var $\hat d_3$ > var $\hat d_1$ by an amount which increases as q decreases
and as $p_1 + p_2$ differs from 1. See Table 10b for ratios of the variances.
Table 12b shows the ratio of exact mean-squared errors for N = 20, 50.
These results favor $\hat d_1$ more strongly than asymptotic comparisons would
indicate.
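
The variance relation for $q_1 = q_0 = q$ can be turned into the ratio reported in Table 10b. In the sketch below the coefficient 2/(qN) on the extra term is an assumption: it was obtained by specializing the general unconditional variance of $\hat d_3$ to $q_0 = q_1$ and $N_1 = N_2$, because the printed coefficient is not legible in the source.

```python
def var_ratio_equal_q(p1, p2, q, N):
    """Ratio Var(d1-hat) / Var(d3-hat) when q1 = q0 = q and N1 = N2 = N."""
    var_d1 = (p1 * (1 - p1) + p2 * (1 - p2)) / (q * N)
    extra = 2 * (1 - q) * (1 - p1 - p2) ** 2 / (q * N)   # assumed coefficient
    return var_d1 / (var_d1 + extra)


r_a = var_ratio_equal_q(0.10, 0.25, 0.5, 20)     # about .396
r_b = var_ratio_equal_q(0.10, 0.50, 0.5, 20)     # .680
r_c = var_ratio_equal_q(0.10, 0.90, 0.5, 20)     # 1.0, since p1 + p2 = 1
```

N cancels from the ratio, consistent with the note in Table 10 that the ratio is independent of N in this case, and the three values agree with the corresponding Table 10b minima.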

For the general case of Model 3 when $p_1 \ne p_2$ and $q_1 \ne q_0$,
$\hat d_1$ is biased and $\hat d_3$ unbiased in large samples. The asymptotic
unconditional ratio of MSE $(\hat d_1)$ to var $\hat d_3$ is shown in Table 10.
These asymptotic comparisons indicate that for small samples (N = 20)
$\hat d_1$ is preferred for $p_1$ close to $p_2$, $\hat d_3$ is preferred for
$|p_1 - p_2|$ large. For samples as large as 200, the bias in $\hat d_1$ makes
$\hat d_3$ appear preferable except for some cases where $|p_1 - p_2|$ is
small. The exact bias in $\hat d_1$ is independent of N and ranges up to .12
in absolute value and up to 45% of d for the cases considered; it increases
in absolute value as $|q_0 - q_1|$ increases. The absolute bias in $\hat d_3$
ranges up to .06 for N = 20 and .04 for N = 50; maximum percentage bias
is 39 for N = 20 and 26 for N = 50 (see Table 11). For a given
$p_1, p_2$ the bias in $\hat d_3$ is always one-sided while the bias in
$\hat d_1$ may be either positive or negative. The bias in $\hat d_3$ decreases
slowly with N, with increasing $|p_1 - p_2|$, and with increasing
$q_0 + q_1$. The exact ratio of unconditional mean-squared errors (Table 12)
generally favors
TABLE 10

Ratio of Asymptotic Unconditional Formulas for MSE $\hat d_1$ and MSE $\hat d_3$ ^a

(MSE $\hat d_1$ / MSE $\hat d_3$)

a) When $p_1 = p_2$, the ratio is independent of N, both estimators are
   asymptotically unbiased. (For $q_0 = q_1 = 1$, the ratio is 1.0.) ^b

      $p_1$    $p_2$      Min      Max
      .10      .10       .220     .926
      .25      .25       .600     .962
      .50      .50       .790     .994

b) When $q_0 = q_1$, the ratio is independent of N, both estimators are
   asymptotically unbiased. (For $q_0 = q_1 = 1$, the ratio is 1.0.)

      $p_1$    $p_2$      Min      Max
      .10      .25       .396     .766
               .50       .680     .914
               .75       .925     .984
               .90      1.000    1.000
      .25      .50       .875     .972
               .75      1.000    1.000

c) For $q_0 \ne q_1$, $\hat d_3$ asymptotically unbiased.

                            N = 20               N = 200
      $p_1$    $p_2$      Min      Max         Min      Max
      .10      .25       .474     .961        .708    3.169
               .50       .696    1.712       1.033    8.636
               .75       .789    1.896       1.002    7.452
               .90      1.005    1.284       1.006    2.075
      .25      .50       .757     .995        .812    2.155
               .75       .985     .999        .999    1.585

^a Formulas evaluated for $p_1, p_2$ of .1, .25, .50, .75, .90;
$q_0, q_1$ of .5, .75, .90, 1.0.

^b Due to symmetries in the formulas, all other cases in $p_1, p_2$
reduce to those shown.
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TABLE 11 

Exact Unconditional Bias of d , d for N = N a 

13 X 2 



For Pl = p 2 


, both 


A A 

d^ and d^ 


are unbiased 


for all 


N . 


The bias in 


/\ 

d^ is independent of 


N . For 


q o ■ “1 • 


d^ is 


unbiased. For q^ £ 


q l ! 












Bias 


/s 

d. 


100 


Bias 








1 


d 




p l 


p 2 


Min 


Max 


Min 


Max 


.10 


.25 


-.0681 


.0597 


-45 


39 




.50 


-.0848 


.1191 


-21 


29 




.75 


-.0271 


.1025 


4 


15 




.90 


.0008 


.0344 


.1 


4 


.25 


.50 


-.0179 


.0594 


-7 


23 




.75 


.0010 


.0428 


.2 


9 



c) For $\hat{d}_3$:

Bias $\hat{d}_3$

                            N = 20                                  N = 50
                $q_0 = q_1 \ne 1$    $q_0 \ne q_1$       $q_0 = q_1 \ne 1$    $q_0 \ne q_1$
$p_1$   $p_2$     Min     Max      Min     Max         Min     Max      Min     Max
.10     .25     .0094   .0557    .0022   .0593       .0052   .0391    .0009   .0393
        .50     .0059   .0579    .0014   .0511       .0021   .0203    .0005   .0170
        .75     .0027   .0288    .0009   .0222       .0010   .0094    .0003   .0074
        .90     .0014   .0143    .0007   .0093         --      --       --      --
.25     .50     .0115   .0746    .0042   .0598       .0050   .0453    .0019   .0352
        .75     .0051   .0542    .0026   .0359       .0018   .0176    .0009   .0116

$^a$ Exact unconditional results obtained for $p_1$, $p_2$ = .10, .25, .50, .75, .90;
$q_0$, $q_1$ = .50, .75, .90, .999. Due to symmetries in the distribution, all
other cases reduce to those shown with possible sign changes.






TABLE 11 (continued)

100 Bias $\hat{d}_3$ / $d$

                            N = 20                                  N = 50
                $q_0 = q_1 \ne 1$    $q_0 \ne q_1$       $q_0 = q_1 \ne 1$    $q_0 \ne q_1$
$p_1$   $p_2$     Min     Max      Min     Max         Min     Max      Min     Max
.10     .25      6.2      37      1.4      39         3.4      26       .6      26
        .50      1.4      14       .4      21          .5     5.0       .1     4.2
        .75       .4     4.4       .1     3.4          .2     1.4       .0     1.1
        .90       .2     1.7       .9     1.1          --      --       --      --
.25     .50      4.6      29      1.6      23         2.0      18       .8      14
        .75      1.0      10       .5     7.1          .4     3.5       .2     2.3



$\hat{d}_1$ except for some cases where $|p_1 - p_2|$ is large and $N = 50$.
Generally the ratio tends to increase as $q_0$, $q_1$ increase; that is,
$\hat{d}_3$ looks worse as the proportion of missing data increases.

Table 13 gives the ratio of the exact to the asymptotic unconditional
variances for $\hat{d}_1$ and $\hat{d}_3$ for $N = 20$ and $N = 50$. For $N$ as
small as 20, the asymptotic variance formula is quite close to the exact
variance for $\hat{d}_1$; for $\hat{d}_3$ the asymptotic formula does not provide
a reasonable approximation. For $p_1 = p_2$, the exact variance of $\hat{d}_3$
goes up with $N$, and for $p_1$ close to $p_2$, the exact variance does not
decrease as fast as $1/N$. Generally the ratio of exact to asymptotic
variance is largest for $q_0$ or $q_1$ small, as would be expected. Note that
the ordinary estimator of the conditional variance of $\hat{d}_1$ should be a
good estimate of its conditional variance under Model 3.

In summary, for $p_1 = p_2$ or $q_1 = q_0$, or $N$ small to moderate,
$\hat{d}_1$ is the preferred estimator. For $N$ large, $p_1 \ne p_2$, and $q_1$
TABLE 12

Exact Ratio of Unconditional Formulas for MSE $\hat{d}_1$ and MSE $\hat{d}_3$ $^a$

MSE $\hat{d}_1$ / MSE $\hat{d}_3$

a) For $p_1 = p_2$, both $\hat{d}_1$ and $\hat{d}_3$ are unbiased. The ratio
increases as $q_0$, $q_1$ increase; for $q_0 = q_1 = 1$, the ratio is 1.0.

                    N = 20          N = 50
$p_1$   $p_2$     Min    Max      Min    Max
.10     .10       .07    .95      .02    .97
.25     .25       .06    .93      .02    .67
.50     .50       .07    .73      .02    .43

b) For $q_0 = q_1 \ne 1$, $\hat{d}_1$ is unbiased.

c) For $q_0 \ne q_1$:
and $q_0$ known to be unequal, $\hat{d}_3$ may be employed. In other words,
unless it is reasonably sure that Model 3 pertains and $p_1 \ne p_2$, more
will be lost than gained by using $\hat{d}_3$.



TABLE 13

Ratio of Exact to Asymptotic Unconditional Variance of $\hat{d}$'s
(Excluding $q_0 = q_1 = 1$, for Which the Ratio Is 1.0)

                         N = 20                        N = 50
$p_1$   $p_2$     Min   Max   Min   Max         Min   Max   Min   Max

The estimators of the ratio $p_1/p_2$ are

$$\hat{R}_1 = \frac{r_1 n_2}{r_2 n_1}$$

and

$$\hat{R}_3 = \frac{r_1 N_2}{r_2 N_1} .$$



In this section the unconditional asymptotic and exact small sample
behavior of $\hat{R}_1$ and $\hat{R}_3$ under Models 1 and 3 are compared.
Results show that for $p_1 = p_2$, $q_0 = q_1$, or $N$ small to moderate,
$\hat{R}_1$ is moderately preferable to $\hat{R}_3$.
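As a minimal sketch of the two estimators compared in this section, the Python function below evaluates both from the observed counts. It assumes that $r_i$ is the number of observed successes, $n_i$ the number of respondents, and $N_i$ the number of subjects sampled in group $i$; the function name, argument order, and the counts in the example are illustrative only.

```python
def ratio_estimators(r1, n1, N1, r2, n2, N2):
    """Return (R1_hat, R3_hat), two estimators of the ratio p1/p2.

    Assumed notation (illustrative): r_i = observed successes,
    n_i = respondents, N_i = subjects sampled in group i.
    """
    if r2 == 0 or n1 == 0 or N1 == 0:
        raise ValueError("estimators are undefined when r2, n1, or N1 is zero")
    R1_hat = (r1 * n2) / (r2 * n1)  # ratio of the observed proportions r1/n1 and r2/n2
    R3_hat = (r1 * N2) / (r2 * N1)  # ratio of successes per subject sampled
    return R1_hat, R3_hat


# Hypothetical counts: 20 sampled per group, 15 and 10 respondents.
print(ratio_estimators(r1=6, n1=15, N1=20, r2=4, n2=10, N2=20))
```

The two estimators differ only in the denominator used for each group, respondents for $\hat{R}_1$ and subjects sampled for $\hat{R}_3$, so they coincide whenever the response rates $n_1/N_1$ and $n_2/N_2$ are equal.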

The Model 3 estimator, $\hat{R}_3$, is asymptotically unbiased with
conditional and unconditional variances given in Tables 6 and 7. The
behavior of $\hat{R}_1$ under Model 3 is given in Table 14.

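TABLE 14

Asymptotic Behavior of $\hat{R}_1$ Under Model 3

$E(\hat{R}_1)$:

$$\frac{\theta_1}{\theta_2}$$

Bias $(\hat{R}_1)$:

$$\frac{(q_1 - q_0)(p_2 - p_1)\,\theta_1}{p_2\, q_1}$$

$N \cdot \mathrm{Var}(\hat{R}_1)$, conditional:

$$\frac{\theta_1 (1 - \theta_1)}{\theta_2^2\, \tau_1 \lambda} + \frac{\theta_1^2\, \theta_2 (1 - \theta_2)}{\theta_2^4\, \tau_2 (1 - \lambda)}$$

$N \cdot \mathrm{Var}(\hat{R}_1)$, unconditional:

$$\frac{\theta_1^2\, q_0 \left[ p_2^2 (1 - p_1)\, \theta_1 (1 - \lambda) + p_1^2 (1 - p_2)\, \theta_2 \lambda \right]}{\theta_2^2\, q_1^2\, p_1^2\, p_2^2\, (1 - \lambda) \lambda}$$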
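The bias entry of Table 14 can be checked numerically. The sketch below assumes the Model 3 parametrization in which a success is observed with probability $q_1$ and a failure with probability $q_0$, so that the limiting observed proportion is $\theta_i = p_i q_1 / [p_i q_1 + (1 - p_i) q_0]$; that parametrization, the function names, and the parameter values are assumptions made for illustration.

```python
def theta(p, q1, q0):
    """Limiting proportion of successes among respondents (assumed Model 3 form)."""
    return p * q1 / (p * q1 + (1 - p) * q0)


def asymptotic_bias_R1(p1, p2, q1, q0):
    """Closed-form asymptotic bias of R1_hat: (q1 - q0)(p2 - p1) theta_1 / (p2 q1)."""
    return (q1 - q0) * (p2 - p1) * theta(p1, q1, q0) / (p2 * q1)


# Hypothetical parameter values, chosen only to exercise the formula.
p1, p2, q1, q0 = 0.25, 0.50, 0.90, 0.50
direct = theta(p1, q1, q0) / theta(p2, q1, q0) - p1 / p2  # limit of R1_hat minus R
print(asymptotic_bias_R1(p1, p2, q1, q0), direct)
```

Under these assumptions the closed form and the direct difference $\theta_1/\theta_2 - p_1/p_2$ agree, and both vanish when $q_1 = q_0$ or $p_1 = p_2$.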



Under Model 3, the conditional variances of $\hat{R}_1$ and $\hat{R}_3$ have the
ratio

$$\frac{\mathrm{Var}\,\hat{R}_1}{\mathrm{Var}\,\hat{R}_3} ,$$

which in large samples will be approximately

$$\frac{q_0 + p_1 (q_1 - q_0)}{q_0 + p_2 (q_1 - q_0)}$$

and consequently will be greater or less than 1.0 for $p_1/p_2$ greater
or less than 1.0.

Exact unconditional results for the bias, variance, and mean square
error were obtained for $\hat{R}_1$ and $\hat{R}_3$ for $N_1 = N_2 = 20, 50$, and for
400 parameter sets in $p_1$, $p_2$, $q_1$, $q_0$. Results were obtained conditional
on $n_1 \ne 0$, $n_2 \ne 0$, and are summarized in Tables 15, 16, and 17.

For $p_1 = p_2$, both $\hat{R}_1$ and $\hat{R}_3$ are asymptotically unbiased. In
small samples the range of the bias is generally comparable for the two
estimators although always slightly less for $\hat{R}_1$ than for $\hat{R}_3$ (see
Table 16a). The biases are generally positive and range up to 30% of
$R$; the biases decrease as $p_1$, $p_2$ increase.

The ratio of the asymptotic unconditional variances is

$$\frac{\mathrm{Var}\,\hat{R}_1}{\mathrm{Var}\,\hat{R}_3} = \frac{1 - \theta}{1 - p\,q_1} ,$$

which is always less than 1.0 except for $q_0 = q_1 = 1$. The ratios
have been evaluated in Table 15a. The exact ratio of mean-squared errors
is shown in Table 17a and is quite similar to asymptotic results even for
$N = 20$. Therefore, for $p_1 = p_2$, the estimator $\hat{R}_1$ is clearly
preferable to $\hat{R}_3$.
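The ratio $(1 - \theta)/(1 - p\,q_1)$ is easy to tabulate. The sketch below assumes, as in the Model 3 setup, that $\theta = p q_1 / [p q_1 + (1 - p) q_0]$, and it assumes that the grid of $q$ values used for Table 15a is .5, .75, .90, 1.0 (the grid quoted in the footnote to the earlier asymptotic table); the function names are hypothetical.

```python
from itertools import product


def var_ratio_equal_p(p, q1, q0):
    """Asymptotic Var(R1_hat) / Var(R3_hat) when p1 = p2 = p (assumed Model 3 theta)."""
    theta = p * q1 / (p * q1 + (1 - p) * q0)
    return (1 - theta) / (1 - p * q1)


Q_GRID = (0.5, 0.75, 0.9, 1.0)  # assumed grid of response probabilities


def ratio_range(p):
    """Min and max of the ratio over the q grid, excluding q0 = q1 = 1."""
    values = [
        var_ratio_equal_p(p, q1, q0)
        for q1, q0 in product(Q_GRID, repeat=2)
        if not (q1 == 1.0 and q0 == 1.0)
    ]
    return min(values), max(values)


print(ratio_range(0.90))
```

With these assumptions the range for $p = .90$ comes out at roughly .18 to .91, in line with the corresponding row of Table 15a. Because $\theta \ge p\,q_1$, the ratio can never exceed 1.0.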

When Model 1 is true and $q_0 = q_1$ but $p_1 \ne p_2$, both $\hat{R}_1$ and
$\hat{R}_3$ are asymptotically unbiased. The biases in $\hat{R}_1$ and $\hat{R}_3$ are
usually positive and show very similar ranges. The percentage bias depends
only on $p_2$ and decreases as $p_2$ increases (see Table 16). The ratio of the
asymptotic unconditional variances is

$$\frac{\mathrm{Var}\,\hat{R}_1}{\mathrm{Var}\,\hat{R}_3} = \frac{p_2 (1 - \lambda) + \lambda p_1 - p_1 p_2}{p_2 (1 - \lambda) + \lambda p_1 - q\, p_1 p_2} \le 1.0 .$$

This ratio is evaluated in Table 15b. The exact unconditional ratio of
MSE $\hat{R}_1$ to MSE $\hat{R}_3$ is shown in Table 17b. The small sample
comparison favors $\hat{R}_1$ somewhat more than the asymptotic results.
Therefore, under Model 1 $\hat{R}_1$ is to be preferred, although for $p_2$
small, the gain in using $\hat{R}_1$ may be relatively small.
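The Model 1 variance ratio can be evaluated directly. The sketch below assumes that $\lambda$ denotes the share of the total sample allocated to group 1, so that $\lambda = .5$ corresponds to $N_1 = N_2$, and that $q$ is the common response probability; the function name is hypothetical.

```python
def model1_var_ratio(p1, p2, q, lam=0.5):
    """Asymptotic Var(R1_hat) / Var(R3_hat) under Model 1 (q0 = q1 = q).

    lam is assumed to be N1 / (N1 + N2).
    """
    common = p2 * (1 - lam) + lam * p1
    return (common - p1 * p2) / (common - q * p1 * p2)


# p1 = .75, p2 = .90 over an assumed grid of q values below 1.
for q in (0.5, 0.75, 0.9):
    print(q, round(model1_var_ratio(0.75, 0.90, q), 2))
```

For $p_1 = .75$, $p_2 = .90$ the ratio runs from about .31 at $q = .5$ to about .69 at $q = .9$, matching the last row of Table 15b. The ratio equals 1.0 at $q = 1$ and falls as $q$ decreases, since only the denominator depends on $q$.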

Under Model 3, when $q_1 > q_0$ and $p_1 \ne p_2$, $\hat{R}_3$ is asymptotically
unbiased and $\hat{R}_1$ is asymptotically biased. Except for $p_2$ small,
$\hat{R}_3$ shows a smaller range for exact bias and its bias decreases with
increasing $N$ and increasing $q_1$ (it is almost unaffected by $q_0$). The
ratio of asymptotic unconditional mean-squared errors is shown in Table 15c.
For an $N$ as small as 20 there is no clear-cut choice between $\hat{R}_1$ and
$\hat{R}_3$; by $N = 200$ $\hat{R}_3$ is clearly preferable. The small sample
results for $N = 20$ shown in Table 17c are quite similar to those obtained
using asymptotic formulas. Although $\hat{R}_3$ improves with $N$, exact results
do not clearly favor either estimator, even for $N$ as large as 50.

The ratio of exact to asymptotic variance is quite similar for $\hat{R}_1$
and $\hat{R}_3$. The exact variance is generally larger except for $p_1 = p_2$
and $N = 20$. For $N = 50$, the ratios vary from 1.0 to 3.7, being
close to 1.0 for $R < 1$ and larger for $R > 1$.

In conclusion, then, for $p_1 = p_2$, $q_0 = q_1$, or $N$ small to
moderate, $\hat{R}_1$ is moderately preferable to $\hat{R}_3$. For $N$ large,
$p_1 \ne p_2$, $q_1 \ne q_0$, $\hat{R}_3$ is preferable to $\hat{R}_1$. For other
situations, the choice depends on the parameter values.
TABLE 15

Ratio of Asymptotic Unconditional Formulas for MSE $\hat{R}_1$ and MSE $\hat{R}_3$

MSE $\hat{R}_1$ / MSE $\hat{R}_3$

a) When $p_1 = p_2$, both $\hat{R}_1$ and $\hat{R}_3$ are asymptotically unbiased and
the ratio is independent of $N$. (For $q_0 = q_1 = 1$, the ratio is 1.0.)

$p_1$   $p_2$     Min     Max
.10     .10       .91    1.00
.25     .25       .80     .99
.50     .50       .65     .96
.75     .75       .40     .92
.90     .90       .18     .91

b) When $q_0 = q_1$, the ratio is independent of $N$. For $q_0 = q_1 = 1.0$,
the ratio is 1.0. The ratio is symmetric in $p_1$, $p_2$.

                 $q_0 = q_1 \ne 1$
$p_1$   $p_2$     Min     Max
.10     .25       .92     .98
        .50       .91     .98
        .75       .90     .98
        .90       .90     .98
.25     .50       .80     .95
        .75       .77     .94
        .90       .76     .94
.50     .75       .57     .87
        .90       .53     .85
.75     .90       .31     .69

c) For $q_0 \ne q_1$, $\hat{R}_3$ is asymptotically unbiased.

                    N = 20           N = 200
$p_1$   $p_2$     Min     Max      Min      Max
.10     .25       .85    1.17      .89     1.45
        .50       .65    1.91      .88     4.29
        .75       .53    3.02      .95     9.75
        .90       .49    3.86      .96    14.31
TABLE 15 (continued)

                    N = 20           N = 200
$p_1$   $p_2$     Min     Max      Min      Max
.25     .10       .71    1.18      .84     1.23
        .50       .74    1.30      .90     3.10
        .75       .65    2.48     1.01    11.12
        .90       .62    3.56     1.08    19.21
.50     .10       .62    1.64      .91     2.22
        .25       .65    1.35      .91     1.92
        .75       .61    1.29      .90     5.04
        .90       .58    2.33     1.07    13.84
.75     .10       .61    2.50      .95     4.85
        .25       .64    2.16     1.03     5.48
        .50       .54    1.32      .94     3.53
        .90       .39    0.99      .64     3.97
.90     .10       .59    3.38      .98     8.09
        .25       .67    3.08     1.12    10.73
        .50       .58    1.98     1.16     8.38
        .75       .35    0.90      .71     3.28

TABLE 16

Exact Unconditional Bias for $\hat{R}_1$, $\hat{R}_3$ as a Percent of $R$ $^a$

a) For $\hat{R}_3$, which is asymptotically unbiased, the percentage bias is
independent of $p_1$ (the range is only slightly larger for $q_0 \ne q_1$
than for $q_0 = q_1$).

             N = 20         N = 50
$p_2$      Min    Max     Min    Max
.10        -23     16      24     31
.25         22     28       7     20
.50          6     22       2      7
.75          2     11       1      4
.90          1      8      --     --

$^a$ Excluding $q_0 = q_1 = 1.0$
TABLE 16 (continued)

b) For $q_0 = q_1 \ne 1$, $\hat{R}_1$ is asymptotically unbiased and the
percentage bias is independent of $p_1$.

             N = 20         N = 50
$p_2$      Min    Max     Min    Max
.10        -23     10      23     30
.25         20     26       8     18
.50          7     15       2      5
.75          2      4       1      1
.90          1      1      --     --

c) For $q_0 \ne q_1$, $\hat{R}_1$ is asymptotically biased.

                    N = 20         N = 50
$p_1$   $p_2$     Min    Max     Min    Max
.10     .10       -23     15      24     30
        .25        13     33       8     20
        .50        -5     40     -15     40
        .75       -31     62     -31     62
        .90       -45     72      --     --
.25     .10       -23     13       9     36
        .25        18     27       6     20
        .50         4     24      -8     22
        .75       -24     42     -27     39
        .90       -37     55      --     --
.50     .10       -23     15      -9     57
        .25        -2     44     -12     40
        .50         4     20       1      6
        .75       -11     18     -15     15
        .90       -25     27      --     --
.75     .10       -29     20     -22     89
        .25       -16     72     -24     68
        .50       -11     43     -13     27
        .75         2      7       0      2
        .90       -10      8      --     --
.90     .10       -34     33
        .25       -23     96
        .50       -14     63
        .75        -7     22
        .90         0      3
TABLE 17

Exact Unconditional Ratio of MSE $\hat{R}_1$ to MSE $\hat{R}_3$

a) For $p_1 = p_2$ (excluding $q_1 = q_0 = 1$):

             N = 20         N = 50
           Min    Max     Min    Max

b) For $q_0 = q_1$ (for $q_0 = q_1 = 1$, the ratio is 1.0):

             N = 20         N = 50
TABLE 17 (continued)

c) For $q_0 \ne q_1$:

                    N = 20          N = 50
$p_1$   $p_2$     Min     Max     Min     Max
.10     .25       .83    1.07     .81    1.20
        .50       .56    1.86     .63    2.30
        .75       .41    3.05     .62    4.14
        .90       .39    3.94      --      --
.25     .10       .67    1.08     .62    1.23
        .50       .63    1.26     .69    1.61
        .75       .46    2.47     .68    3.91
        .90       .47    3.58      --      --
.50     .10       .52    1.26     .43    1.91
        .25       .46    1.43     .57    1.46
        .75       .45    1.29     .59    1.92
        .90       .41    2.32      --      --
.75     .10       .56    1.60     .38    3.24
        .25       .38    2.35     .51    2.45
        .50       .41    1.32     .53    1.62
        .90       .29     .99      --      --
.90     .10       .61    2.27
        .25       .36    3.34
        .50       .38    1.86
        .75       .30     .87



11. The Estimation of the Odds Ratio OR

The Model 1, Model 2, and Model 3 estimators of OR all reduce to

$$\widehat{OR} = \frac{r_1 (n_2 - r_2)}{r_2 (n_1 - r_1)} .$$

This estimator is asymptotically unbiased under all three models with
asymptotic unconditional variances under the three models given in
Table 18.
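TABLE 18

Asymptotic Unconditional Variance of $\widehat{OR}$

Model 1:

$$\frac{p_1 (1 - p_2)}{\lambda (1 - \lambda)\, q\, p_2^3 (1 - p_1)^3} \left[ p_2 (1 - p_2)(1 - \lambda) + p_1 (1 - p_1) \lambda \right]$$

Model 2:

$$\frac{p_1 (1 - p_2)}{\lambda (1 - \lambda)\, q_1 q_2\, p_2^3 (1 - p_1)^3} \left[ p_2 (1 - p_2)\, q_2 (1 - \lambda) + p_1 (1 - p_1)\, q_1 \lambda \right]$$

Model 3:

$$\frac{p_1 (1 - p_2)}{\lambda (1 - \lambda)\, q_0 q_1\, p_2^3 (1 - p_1)^3} \left[ p_2 (1 - p_2) \left[ p_1 q_1 + (1 - p_1) q_0 \right] (1 - \lambda) + p_1 (1 - p_1) \left[ p_2 q_1 + (1 - p_2) q_0 \right] \lambda \right]$$

Alternatively, the asymptotic variances are given by

$$\frac{p_1 (1 - p_2)}{\lambda (1 - \lambda)\, p_2^3 (1 - p_1)^3}\, f(i)$$

where

$$f(1) = \frac{p_1 (1 - p_1) \lambda}{q} + \frac{p_2 (1 - p_2)(1 - \lambda)}{q}$$

$$f(2) = \frac{p_1 (1 - p_1) \lambda}{q_2} + \frac{p_2 (1 - p_2)(1 - \lambda)}{q_1}$$

$$f(3) = p_1 (1 - p_1) \lambda \left[ \frac{p_2}{q_0} + \frac{1 - p_2}{q_1} \right] + p_2 (1 - p_2)(1 - \lambda) \left[ \frac{p_1}{q_0} + \frac{1 - p_1}{q_1} \right]$$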

This independence of the form of the estimator from $q(x,y)$ suggests
that the use of $\widehat{OR}$ will be robust to $q(x,y)$. Further investigation
of $\widehat{OR}$ is then in order.
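The three variance expressions share a common factor and differ only through $f(i)$, which the sketch below makes explicit. It assumes that Model 2 uses group-specific response probabilities $q_1$, $q_2$, that Model 3 uses outcome-specific probabilities $q_1$ (success) and $q_0$ (failure), and that $\lambda$ is the share of the sample in group 1; the formulas are taken as tabulated, without an explicit sample-size divisor, and all function names are hypothetical.

```python
def common_factor(p1, p2, lam):
    """Factor shared by the three tabulated variance formulas."""
    return p1 * (1 - p2) / (lam * (1 - lam) * p2**3 * (1 - p1) ** 3)


def f_model1(p1, p2, lam, q):
    """f(1): one response probability q for everyone."""
    return (p1 * (1 - p1) * lam + p2 * (1 - p2) * (1 - lam)) / q


def f_model2(p1, p2, lam, q1, q2):
    """f(2): response probability depends on the group (assumed q1, q2)."""
    return p1 * (1 - p1) * lam / q2 + p2 * (1 - p2) * (1 - lam) / q1


def f_model3(p1, p2, lam, q1, q0):
    """f(3): response probability depends on the outcome (assumed q1, q0)."""
    return (p1 * (1 - p1) * lam * (p2 / q0 + (1 - p2) / q1)
            + p2 * (1 - p2) * (1 - lam) * (p1 / q0 + (1 - p1) / q1))


# Hypothetical parameter values, equal allocation.
p1, p2, lam = 0.25, 0.50, 0.5
print(common_factor(p1, p2, lam) * f_model1(p1, p2, lam, q=0.75))
print(common_factor(p1, p2, lam) * f_model3(p1, p2, lam, q1=0.9, q0=0.5))
```

When all response probabilities are equal, $f(2)$ and $f(3)$ both collapse to $f(1)$, which is one way to check a transcription of the table.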
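Under $p_1 = p_2 = p$ the variance formulas reduce to

Model 1:

$$\frac{1}{\lambda (1 - \lambda)\, p (1 - p)} \cdot \frac{1}{q}$$

Model 2:

$$\frac{1}{\lambda (1 - \lambda)\, p (1 - p)} \left[ \frac{\lambda}{q_2} + \frac{1 - \lambda}{q_1} \right]$$

Model 3:

$$\frac{1}{\lambda (1 - \lambda)\, p (1 - p)} \left[ \frac{p}{q_0} + \frac{1 - p}{q_1} \right]$$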

Table 19 shows the exact bias in $\widehat{OR}$ under Model 3 for $N_1 = N_2 = 20$.
Table 20 shows the performance of the asymptotic variance formula for
$N_1 = N_2 = 20$. Generally the bias is of the order of 20% to 50% of
OR, although it does not contribute appreciably to MSE. This suggests
a modification of $\widehat{OR}$ to reduce bias along the lines suggested by
Anscombe (1956) and Gart & Zweifel (1967) for estimating the logit. The
exact behavior of $\widehat{OR}$ does not seem to depend particularly on
$|p_1 - p_2|$ or $|q_1 - q_0|$.
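One familiar modification of this kind adds 1/2 to every cell of the 2x2 table before forming the odds ratio, which mirrors the half-corrections used for the empirical logit. The sketch below shows that variant next to the unmodified estimator; the memorandum does not specify the form of the modification, so the choice of 1/2 and the function names are assumptions.

```python
def odds_ratio(r1, n1, r2, n2):
    """Unmodified estimator r1 (n2 - r2) / [r2 (n1 - r1)]; undefined if the denominator is 0."""
    return (r1 * (n2 - r2)) / (r2 * (n1 - r1))


def odds_ratio_half_corrected(r1, n1, r2, n2):
    """Assumed bias-reducing variant: add 1/2 to each of the four cells."""
    return ((r1 + 0.5) * (n2 - r2 + 0.5)) / ((r2 + 0.5) * (n1 - r1 + 0.5))


# Hypothetical counts among respondents.
print(odds_ratio(6, 10, 4, 10), odds_ratio_half_corrected(6, 10, 4, 10))
# The corrected version stays finite even when a cell is empty.
print(odds_ratio_half_corrected(0, 10, 5, 10))
```

The correction pulls the estimate toward 1.0 and removes the division by zero that the unmodified estimator hits when $r_2 = 0$ or $r_1 = n_1$.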

TABLE 19

Exact Unconditional Bias of $\widehat{OR}$ for $N_1 = N_2 = 20$ $^a$

                      Bias               100 Bias / OR
$p_1$   $p_2$      Min       Max         Min     Max
.10     .10      -.195      .331         -20      33
        .25       .116      .176          39      59
        .50       .0207     .0522         23      58
        .75       .00505    .0126         19      47
        .90       .00151    .00363        19      45

$^a$ For $q_0$, $q_1$ = .5, .75, .90, 1.0.

For the other cases in $p_1$, $p_2$, note that $p_1$, $p_2$ is equivalent
to $(1 - p_2)$, $(1 - p_1)$ with the $q$'s reversed.
TABLE 19 (continued)

                      Bias               100 Bias / OR

TABLE 20

Ratios of Exact to Asymptotic Variance and MSE $^a$
for $\widehat{OR}$ for $N_1 = N_2 = 20$
12. Conclusions: Tests and Confidence Intervals

Under Models 1, 2, or 3, a test of $H_0\colon p_1 = p_2$ may be carried out
using the Irwin-Fisher exact test for the 2x2 table of $r_i$ and $n_i - r_i$,
conditional on $n_1$, $n_2$, and $r_1 + r_2$. The tabled significance values
and the power function will be correct under all three models. If Model 4
obtains, an accurate test of $p_1 = p_2$ cannot be performed without
additional information.
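A minimal sketch of the conditional exact test follows, using only the standard library. It conditions on $n_1$, $n_2$, and $r_1 + r_2$ and sums the hypergeometric probabilities of all tables no more likely than the observed one; this two-sided rule is one common convention and is an assumption here, as are the function name and the example counts.

```python
from math import comb


def fisher_exact_two_sided(r1, n1, r2, n2):
    """Two-sided exact p-value for H0: p1 = p2, given n1, n2 and r1 + r2."""
    m = r1 + r2                       # fixed total number of observed successes
    lo, hi = max(0, m - n2), min(n1, m)
    total = comb(n1 + n2, m)
    # Hypergeometric probability of each possible value of r1 under H0.
    pmf = {k: comb(n1, k) * comb(n2, m - k) / total for k in range(lo, hi + 1)}
    p_obs = pmf[r1]
    # Sum the tables that are no more probable than the observed one.
    return sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))


# Hypothetical respondent counts: 3 of 4 successes versus 1 of 4.
print(fisher_exact_two_sided(3, 4, 1, 4))
```

Only the respondents enter the calculation, which is why the same test applies under Models 1, 2, and 3.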

The major issues in point and interval estimation are the choice of
an estimator and the calculation of a variance. For the estimation of
$d$, $\hat{d}_1$ is the estimator of choice for Models 1 and 2, and, though
biased, may be useful for Model 3 unless $N$ is larger than 50 and $p_1$
and $p_2$ are known to be widely different. The ordinary estimator of the
conditional variance of $\hat{d}_1$ should perform well under all three models.

To estimate $R$, use $\hat{R}_1$ in Models 1 and 2; under Model 3 the
choice between $\hat{R}_1$ and $\hat{R}_3$ depends strongly on the values of the
parameters. Modification of these estimators to reduce bias is of
interest. It is common to base confidence intervals on the large sample
normal distributions of $\hat{R}$. In small samples the large sample standard
error is biased. In addition, it may be sensible to estimate the large
sample conditional variance formula for Model 3 by substituting $r_i/n_i$
for the $\theta_i$, but there is no good estimator for the $p_i$ and $q_i$ of
the unconditional formula.

To estimate OR, $\widehat{OR}$ (or a modification to reduce bias) can be
used under all three models. Uniformly most-accurate confidence intervals
can be constructed for OR using the noncentral distribution of $r_1$,
$r_2$ conditional on $(r_1 + r_2)$ (see Lehmann, 1959). This noncentral
distribution is the same under all three models.
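The noncentral conditional distribution referred to here can be sketched as follows. Given $n_1$, $n_2$, and $m = r_1 + r_2$, the probability that $r_1 = k$ is proportional to $\binom{n_1}{k}\binom{n_2}{m-k}\,\psi^k$, where $\psi$ is the odds ratio; the function name and the example values are hypothetical, and the code shows only the distribution, not a full interval procedure.

```python
from math import comb


def conditional_pmf(n1, n2, m, psi):
    """P(r1 = k | r1 + r2 = m) for odds ratio psi, as a dict keyed by k."""
    lo, hi = max(0, m - n2), min(n1, m)
    weights = {k: comb(n1, k) * comb(n2, m - k) * psi**k for k in range(lo, hi + 1)}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


def conditional_mean(n1, n2, m, psi):
    """Mean of r1 under the conditional distribution."""
    return sum(k * p for k, p in conditional_pmf(n1, n2, m, psi).items())


# psi = 1 recovers the central hypergeometric distribution used by the exact test.
print(conditional_pmf(4, 4, 4, psi=1.0))
print(conditional_mean(4, 4, 4, psi=3.0))
```

Confidence limits for OR are then the values of $\psi$ at which the observed $r_1$ falls in a tail of prescribed probability, which can be found numerically, for example by bisection on $\psi$.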

In conclusion, then, the effect of different models for missing data
depends on the inference problem at hand. Choice of a test for
$H_0\colon p_1 = p_2$ and an estimator for OR is the same for Models 1, 2,
and 3. Estimation of $p$ and $d$ and $R$ is the same for Models 1 and 2 but
may be difficult for Model 3. Under Model 4, additional information is
necessary.
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